Static analysis of numerical properties in the presence of pointers. (Analyse statique de propriétés numériques en présence de pointeurs)

Author

  • Zhoulai Fu
Abstract

tel-00918593, version 1, 13 Dec 2013. Chapter 2: Background

Abstract interpretation, introduced in the late 1970s [24] by P. Cousot and R. Cousot, is a framework for semantics approximation. We briefly review the major terminology of this theory. Informally, abstract interpretation constructs two different meanings for a programming language: the first gives the usual meaning of programs in the language, and the second can be used to answer certain questions about the runtime behavior of programs in the language. The standard meaning of programs, called the concrete semantics, can typically be described by their input-output function, and the standard interpretation is then a function that maps programs to their input-output functions. The abstract meaning, called the abstract semantics, is defined by a function that maps programs to mathematical objects of a particular universe, called the abstract semantic domain.

Mathematically, the semantics of a program $P$ can often be expressed as a least fixpoint $\mathrm{lfp}\, t[\![P]\!]$, that is, the least solution of a constraint system $X = t[\![P]\!](X)$ computed on a complete lattice.

Example 2.3.1. Consider a loop that increments the value of x:

    1  x = 0;
    2  while (x < 10) {
    3    x = x + 1;
    4  }

To infer the possible values of x before each program point (from 1 to 5), we can construct the following constraint system:

$$
\begin{aligned}
X_1 &= \emptyset \\
X_2 &\supseteq \{0\} \cup X_4 \\
X_3 &\supseteq X_2 \cap (-\infty, 10) \\
X_4 &\supseteq \{x + 1 \mid x \in X_3\} \\
X_5 &\supseteq X_2 \cap [10, \infty)
\end{aligned}
$$

Analyzing this problem amounts to computing the least fixpoint of the constraint system on the domain $\prod_{i=1}^{5}(X_i \to \mathit{Intv})$, in which $\mathit{Intv}$ is the set of intervals.

The soundness of the abstract semantics is described using a concretization function $\gamma : A^\sharp \to A^\flat$, which gives the meaning of the abstract elements in terms of concrete elements. We say that the abstract semantics $\mathrm{lfp}\, t^\sharp[\![P]\!]$ is sound with respect to the concrete semantics $\mathrm{lfp}\, t^\flat[\![P]\!]$, or that the latter is approximated by the former, if

$$\mathrm{lfp}\, t^\flat[\![P]\!] \sqsubseteq^\flat \gamma(\mathrm{lfp}\, t^\sharp[\![P]\!]).$$
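The constraint system of Example 2.3.1 can be solved mechanically. The following is a minimal sketch, not the thesis's implementation: it assumes intervals are represented as `(lo, hi)` pairs with `None` for the empty interval, reads the fourth constraint as ranging over the values that hold before the increment ($X_3$), and uses plain iteration from the bottom element, which terminates here because the loop bound keeps every interval finite. All function names are hypothetical.

```python
INF = float("inf")

def join(a, b):
    """Least upper bound of two intervals; None is the empty interval."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    """Greatest lower bound (intersection) of two intervals."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def add_const(a, c):
    """Abstract effect of x = x + c on an interval."""
    return None if a is None else (a[0] + c, a[1] + c)

def step(X):
    """Apply the right-hand side of every constraint once."""
    X1, X2, X3, X4, X5 = X
    return (
        None,                  # X1 = empty set
        join((0, 0), X4),      # X2 >= {0} U X4
        meet(X2, (-INF, 9)),   # X3 >= X2 n (-inf, 10); over integers, x <= 9
        add_const(X3, 1),      # X4 >= {x + 1 | x in X3}
        meet(X2, (10, INF)),   # X5 >= X2 n [10, +inf)
    )

def solve():
    """Iterate from bottom until the assignment no longer changes."""
    X = (None,) * 5
    while True:
        nxt = step(X)
        if nxt == X:
            return X
        X = nxt

print(solve())
```

Under these assumptions the iteration stabilizes with $X_2 = [0, 10]$, $X_3 = [0, 9]$, $X_4 = [1, 10]$ and $X_5 = [10, 10]$, so the analysis infers that x equals 10 when the loop exits.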
In this paper, we frequently verify a stronger soundness condition of the form

$$t^\flat[\![P]\!] \circ \gamma \sqsubseteq^\flat \gamma \circ t^\sharp[\![P]\!] \tag{2.1}$$

By "being sound" we always mean partial soundness, i.e., if $P$ terminates, then (2.1) holds. We now introduce the concept of Galois connection.

Definition 2.3.1 (Galois connection). Consider two posets $(A^\flat, \sqsubseteq^\flat)$ and $(A^\sharp, \sqsubseteq^\sharp)$. If functions $\alpha : A^\flat \to A^\sharp$ and $\gamma : A^\sharp \to A^\flat$ satisfy, for each $a^\flat \in A^\flat$ and $a^\sharp \in A^\sharp$,

$$a^\flat \sqsubseteq^\flat \gamma(a^\sharp) \iff \alpha(a^\flat) \sqsubseteq^\sharp a^\sharp$$

then the quadruple $(A^\flat, \alpha, \gamma, A^\sharp)$ is called a Galois connection.

In terms of abstract interpretation, the sets $A^\flat$ and $A^\sharp$ are often called the concrete domain and the abstract domain respectively, and the functions $t^\flat \in A^\flat \to A^\flat$ and $t^\sharp \in A^\sharp \to A^\sharp$ are called the (global) concrete transfer function and the (global) abstract transfer function.

Example 2.3.2 (Cartesian abstraction). Given two posets $A$ and $B$, we have the Galois connection $(\wp(A \times B), \alpha_\times, \gamma_\times, \wp(A) \times \wp(B))$ where

$$\alpha_\times \triangleq \lambda R.\, (\mathrm{post}[\mathit{fst}]\, R,\ \mathrm{post}[\mathit{snd}]\, R) \tag{2.2}$$

$$\gamma_\times \triangleq \lambda (A_0, B_0).\, A_0 \times B_0 \tag{2.3}$$

Example 2.3.3 (Composition of Galois connections). Given two Galois connections $(A^\flat, \alpha_1, \gamma_1, A^\natural)$ and $(A^\natural, \alpha_2, \gamma_2, A^\sharp)$, the quadruple $(A^\flat, \alpha_3, \gamma_3, A^\sharp)$ is also a Galois connection, with $\alpha_3 \triangleq \alpha_2 \circ \alpha_1$ and $\gamma_3 \triangleq \gamma_1 \circ \gamma_2$.

Theorem 2.3.1 (Approximation of Fixpoint [25]). Consider two complete lattices $(A^\flat, \sqsubseteq^\flat)$ and $(A^\sharp, \sqsubseteq^\sharp)$ and the Galois connection $(A^\flat, \alpha, \gamma, A^\sharp)$. Let $t^\flat$ and $t^\sharp$ be monotonic functions defined respectively on $A^\flat$ and $A^\sharp$. If the condition $t^\flat \circ \gamma \sqsubseteq^\flat \gamma \circ t^\sharp$ holds, then the least fixpoint of $t^\flat$ is approximated by the least fixpoint of $t^\sharp$:

$$\mathrm{lfp}\, t^\flat \sqsubseteq^\flat \gamma(\mathrm{lfp}\, t^\sharp)$$

The computation of $\mathrm{lfp}\, t^\sharp$ is problem-dependent. If the iterates $t^{\sharp k}(\bot^\sharp)$ for $k = 0, 1, \ldots$, starting from some initial $\bot^\sharp$, eventually become stable (as is guaranteed when $A^\sharp$ satisfies the ascending chain condition), then $\mathrm{lfp}\, t^\sharp$ can be computed by brute-force iteration. This is the typical case for data-flow analysis.
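The Galois-connection condition of Definition 2.3.1 can be checked exhaustively on a small instance of the Cartesian abstraction of Example 2.3.2. The sketch below is illustrative only: it assumes finite carrier sets, orders the abstract domain by componentwise inclusion, and uses hypothetical helper names (`powerset`, `leq_abs`, `is_galois`).

```python
from itertools import chain, combinations, product

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))
    return [frozenset(s) for s in subsets]

def alpha(R):
    """Abstraction: project a relation onto its first and second components."""
    return (frozenset(a for a, _ in R), frozenset(b for _, b in R))

def gamma(pair):
    """Concretization: the full Cartesian product of the two components."""
    A0, B0 = pair
    return frozenset(product(A0, B0))

def leq_abs(p, q):
    """Componentwise inclusion on pairs of sets."""
    return p[0] <= q[0] and p[1] <= q[1]

def is_galois(A, B):
    """Check R <= gamma(p)  iff  alpha(R) <= p for every R and p."""
    concrete = powerset(product(A, B))
    abstract = [(a, b) for a in powerset(A) for b in powerset(B)]
    return all((R <= gamma(p)) == leq_abs(alpha(R), p)
               for R in concrete for p in abstract)

print(is_galois({0, 1}, {"a", "b"}))
```

The check also makes the loss of precision visible: abstracting the relation `{(0, "a"), (1, "b")}` and concretizing it again yields all four pairs, because the Cartesian abstraction forgets which first component goes with which second component.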
If the iterates converge slowly or do not converge, the algorithm that computes the fixpoint of $t^\sharp$ may involve an extrapolation strategy. In [24], Cousot introduced an operator called widening to guarantee fast termination of the fixpoint computation.

Definition 2.3.2. A widening $\nabla$ is an operator of type $A^\sharp \times A^\sharp \to A^\sharp$ such that

$$\forall a_1^\sharp, a_2^\sharp \in A^\sharp:\quad a_1^\sharp \sqsubseteq^\sharp a_1^\sharp \nabla a_2^\sharp \ \wedge\ a_2^\sharp \sqsubseteq^\sharp a_1^\sharp \nabla a_2^\sharp$$

and for all increasing chains $a_0^\sharp \sqsubseteq^\sharp a_1^\sharp \sqsubseteq^\sharp \cdots$, the increasing chain defined by $w_0 = a_0^\sharp$, $w_1 = w_0 \nabla a_1^\sharp$, ..., $w_{i+1} = w_i \nabla a_{i+1}^\sharp$ is not strictly increasing.

Theorem 2.3.2 (Kleene iteration with widening [24]). The iteration sequence

$$X_0 = \bot^\sharp, \qquad X_{i+1} = \begin{cases} X_i & \text{if } t^\sharp(X_i) \sqsubseteq^\sharp X_i \\ X_i \nabla t^\sharp(X_i) & \text{otherwise} \end{cases}$$

is ultimately stationary and its limit is a post-fixpoint of $t^\sharp$.

2.4 Static Numerical Analysis

The target language of this static analysis is $\mathrm{WHILE}_n$. The tracked information is called numerical properties. We distinguish two kinds:

  • Global numerical properties relate to the whole program, such as execution time and memory consumption. An example is static worst-case execution time (WCET) analysis. It is remarkably difficult to determine tight WCET bounds because of hardware complications and architectural features such as instruction pipelines. A well-known WCET analyzer is aiT by AbsInt (http://www.absint.com).
  • Local numerical properties are those associated with program identifiers, in particular program variables. This category of analysis is needed for the automatic detection of well-known run-time errors such as division by zero or out-of-bounds array indices.

The algorithm developed by Karr in 1976 computes, for each program control point, the affine relations that hold among the program variables whenever the control point is reached [44]. An affine relation is a property of the form

$$\sum_{i=1}^{n} c_i x_i = c$$

where the $x_i$ are program variables and $c_i$, $c$ are constants. In 1978, Cousot and Halbwachs [26] presented an influential generalization of Karr's approach. Building on the theory of abstract interpretation, they brought the design of numerical abstract domains into the mainstream. By using polyhedra instead of affine relations as the space of approximation, their analysis allows us to specify programs with affine inequalities $\sum_{i=1}^{n} c_i x_i \le c$.

This thesis considers the second category of numerical properties. We use the term numerical property for any conjunction of formulae in some decidable theory of arithmetic. A numerical property can be loosely seen as a geometric shape. For example, the numerical property $\{x^2 + y^2 \le 1,\ x \le 0,\ y \le 0\}$ is the conjunction of three arithmetic formulae and represents a quarter of the unit disc. Each formula of a numerical property is assumed to be quantifier-free, and the constants in the formulae are integers. Certain classes of numerical properties with a uniform geometric feature are called abstract numerical domains. The "interval", "octagon", and "polyhedral" abstract domains are thus named after the geometric shapes they represent. In this paper, an abstract numerical domain is considered as a subset of the universe of numerical properties.

As usual, an environment is a partial mapping from program variables to their associated values. In our context, we consider numerical environments of integer values,

$$\mathit{Num} \triangleq \mathit{Var}_n \to \mathbb{Z}_\bot$$

where $\mathit{Var}_n$ is the set of scalar variables holding numerical values. The relationship between a numerical environment $n$ and a numerical property $\bar{n}$ is formalized by the concept of valuation. We say that $n$ is a valuation of $\bar{n}$, denoted by $n \models \bar{n}$, if $\bar{n}$ becomes a tautology after each of its free variables, if any, has been replaced by its corresponding value in $n$.

Definition 2.4.1 (Interface of the traditional numerical analyzer).
$$(\mathrm{WHILE}_n,\ \wp(\mathit{Num}),\ [\![\cdot]\!]^\natural_n,\ \gamma_n,\ \mathit{Num}^\sharp,\ [\![\cdot]\!]^\sharp_n)$$

The concrete numerical domain and the abstract numerical domain for the language $\mathrm{WHILE}_n$ are respectively $\wp(\mathit{Num})$ and $\mathit{Num}^\sharp$. They are related by the concretization function $\gamma_n : \mathit{Num}^\sharp \to \wp(\mathit{Num})$ defined by

$$\gamma_n(\bar{n}) = \{ n \in \mathit{Num} \mid n \models \bar{n} \} \tag{2.4}$$

The partial order $\sqsubseteq$ is consistent with the monotonicity of $\gamma_n$, i.e., $\bar{n}_1 \sqsubseteq \bar{n}_2$ implies $\gamma_n(\bar{n}_1) \subseteq \gamma_n(\bar{n}_2)$. For each statement $s_n$ of $\mathrm{WHILE}_n$, the concrete semantics $[\![s_n]\!]^\natural_n$ is assumed to be the powerset lifting $[\![s_n]\!]^\natural_n \triangleq \mathrm{post}[\xrightarrow{\mathit{Num}}(s_n)]$ of some standard operational semantics:

$$\xrightarrow{\mathit{Num}}\ :\ \mathrm{WHILE}_n \to \wp(\mathit{Num} \times \mathit{Num}) \tag{2.5}$$

The abstract semantics $[\![\cdot]\!]^\sharp_n$ satisfies the soundness condition:

$$[\![\cdot]\!]^\natural_n \circ \gamma_n \subseteq \gamma_n \circ [\![\cdot]\!]^\sharp_n \tag{2.6}$$

Finally, we assume the availability of a join operator $\sqcup$ and a widening operator $\nabla$. The join operator is assumed to be sound with regard to the partial order $\sqsubseteq$, and $\nabla$ is assumed to be sound as specified in Sect. 4 of [23].

2.5 Points-to Analysis

The imperative language $\mathrm{WHILE}_p$ provides basic pointer operations such as dynamic allocation, pointer assignment, field store and field load. Classical store-based semantics models the memory as an environment and a store. Roughly speaking, a variable assignment modifies the environment, while the store is modified by indirect memory accesses. The environment is most commonly thought of as a partial mapping from program variables to locations, and the store is specified by a partial mapping from locations to values. Conventionally, the model also needs to know the usage status of allocated locations. Each state of the store-based semantic domain used in this thesis is assumed to be garbage-free, namely, each allocated location is reachable in a sense that we shall make precise below.

Points-to analysis [30] is a dataflow analysis widely used for static pointer analysis.
The essential idea of points-to analysis is to partition the concrete memory references $\mathit{Ref}$ into a finite set of abstract references $H$, and then to summarize the run-time pointer relations via elements of $H$ and program variables. The result of the analysis is often expressed by a graph-like structure, called a points-to graph. The memory partition process mentioned above is sometimes called a naming scheme. A popular naming scheme, known as k-CFA [63], is based on the k most recent call sites on the stack of the thread creating the object.

Pointer analyses have been surveyed by numerous authors [37, 57, 59]. The 5-page survey of Hind and Pioli [41] is the most cited; it identifies different axes that balance efficiency against effectiveness, with the so-called equality-based [66], subset-based [1] and flow-sensitive [18] variations. Several directions for then-future research are also discussed: how to improve efficiency without affecting scalability or vice versa, how to design an analysis for a client's needs, whether flow-sensitive or context-sensitive analyses are worth more investigation, which heap model to choose, etc.

For type-safe languages like Java, flow-insensitive analysis is of polynomial complexity [13], but the analysis is difficult in general. Horwitz [42] shows the NP-hardness of flow-insensitive analysis for programs without dynamic memory allocation in which all variables are scalars and an arbitrary number of dereferences is allowed.

Many techniques have been proposed to optimize points-to analyses. The online cycle elimination of Fahndrich et al. [32] represents points-to analysis as a graph problem and collapses cycles into single nodes, since every element of a cycle has the same points-to information.
Lazy Cycle Detection, proposed by Hardekopf and Lin [38, 39], finds the cycles using heuristics, so that the complexity overhead of the approach of Fahndrich et al. can be greatly reduced. Another way to improve points-to analysis is to use efficient data structures. In particular, BDDs [11] were found to be much more space-efficient than traditional storage of points-to information [75]. This finding was then exploited by Berndl et al. [4] and Whaley and Lam [73] for efficient BDD-based points-to analysis algorithms for Java.

The challenge of points-to analysis, as in other static analyses, is to improve the precision of the analysis without sacrificing scalability. Lhotak and Chung [49] propose a Strong Update analysis combining both features: it is as efficient as flow-insensitive analysis, with the same worst-case bounds, yet its precision benefits from strong updates as in flow-sensitive analysis. The key insight is that strong updates are applicable when the dereferenced points-to set is a singleton, and a singleton set is cheap to analyze. Hence the analysis is flow-sensitive only on singleton sets. Larger sets, which cannot lead to strong updates, are modeled flow-insensitively to maintain efficiency.

De and D'Souza [27] propose to represent points-to information as maps from access paths to sets of abstract objects, rather than as points-to graphs. Their approach is similar to the classic k-limiting approach, which truncates analysis targets by a predefined bound k. Their method leads to a flow-sensitive pointer analysis algorithm for Java that can perform strong updates on heap-based pointers.

Recently, Khedker, Mycroft and Rawat [45] proposed a lazy points-to analysis based on liveness analysis. They argue that the vast majority of points-to pairs calculated by existing algorithms are never used by any client analysis or transformation because they involve dead variables.
They reformulate a flow- and context-sensitive points-to analysis as a joint points-to and liveness analysis, so that potentially unused points-to relations are not computed.

Concrete semantics

We assume that a naming scheme can be interfaced with a function

$$\triangleright \in \mathit{Ref} \to H \tag{2.7}$$

In this presentation, we use a simple and standard naming scheme that names heap elements after the program point of the statement that allocates them (which is typical for the context-insensitive variant of points-to analysis). The elements of $H$ will also be called allocation sites or abstract references.

Let $\mathit{Var}_p$, $\mathit{Ref}$, and $\mathit{Fld}_p$ be the sets of pointer variables, references, and fields of pointer references. A state $\sigma$ of the store-based semantics is a pair consisting of a partial mapping $\rho$ from $\mathit{Var}_p$ to $\mathit{Ref}$, called the environment, and a partial mapping $\hbar$ from $\mathit{Ref} \times \mathit{Fld}_p$ to $\mathit{Ref}$, called the store. The store-based semantic domain is denoted by $\mathit{Pter}$:

$$\mathit{Pter} \triangleq \{ (\rho, \hbar) \mid \rho \in \mathit{Var}_p \to \mathit{Ref}_\bot,\ \hbar \in \mathit{Ref} \times \mathit{Fld}_p \to \mathit{Ref}_\bot \}$$

Given $(\rho, \hbar) \in \mathit{Pter}$, we say that $r \in \mathit{Ref}$ is reachable if there exists $x \in \mathit{Var}_p$ such that $\rho(x) = r$, or there exist some reachable $r' \in \mathit{Ref}$ and $f \in \mathit{Fld}_p$ such that $\hbar(r', f) = r$. The state $(\rho, \hbar)$ is called garbage-free if each reference in $\{ r \in \mathit{Ref} \mid (r, f) \in \mathrm{dom}(\hbar) \}$ is reachable. The concrete semantic domain is defined to be the collection of sets of garbage-free states in $\mathit{Pter}$.

The effect of a statement $s_p$ of $\mathrm{WHILE}_p$ can be modeled as an operational semantics on $\mathit{Pter}$. We write $\langle s_p, \sigma \rangle \xrightarrow{\mathit{Pter}} \sigma'$ if, whenever $\sigma$ is the state before $s_p$, then $\sigma'$ can be a state after $s_p$, provided that $s_p$ terminates.
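The definitions of reachability and garbage-freeness above translate directly into a worklist traversal. The following is a small sketch under stated assumptions: environments and stores are Python dictionaries, `None` stands for $\bot$, and references are arbitrary hashable values such as strings. The function names are hypothetical and not taken from the thesis.

```python
def reachable(rho, store):
    """Set of references reachable from some variable through field links.

    rho   : dict mapping variable -> reference or None (None models bottom)
    store : dict mapping (reference, field) -> reference or None
    """
    seen = {r for r in rho.values() if r is not None}
    work = list(seen)
    while work:
        r1 = work.pop()
        for (src, _field), dst in store.items():
            if src == r1 and dst is not None and dst not in seen:
                seen.add(dst)
                work.append(dst)
    return seen

def is_garbage_free(rho, store):
    """True if every reference that owns a field in the store is reachable."""
    live = reachable(rho, store)
    return all(r in live for (r, _field) in store)

rho = {"x": "r1", "y": None}
store = {("r1", "f"): "r2", ("r2", "f"): None}
print(reachable(rho, store), is_garbage_free(rho, store))
```

In this example both `r1` and `r2` are reachable from `x`, so the state is garbage-free; adding a store entry for a reference that no variable can reach would make the check fail.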
Since we are only interested in garbage-free states, below we assume that a garbage collection operator, denoted by $gc$, is available.

\begin{align}
\langle x = \mathtt{null},\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho[x \to \bot],\ \hbar) \tag{2.8} \\
\langle x = \mathtt{new},\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho[x \to r_{fresh}],\ \hbar) \quad \text{for } r_{fresh} \notin \mathit{reachable}(\rho, \hbar) \tag{2.9} \\
\langle x = y,\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho[x \to \rho(y)],\ \hbar) \tag{2.10} \\
\langle x = y.f,\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho[x \to \hbar(\rho(y), f)],\ \hbar) \quad \text{for } (\rho(y), f) \in \mathrm{dom}(\hbar) \tag{2.11} \\
\langle x.f = y,\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho,\ \hbar[(\rho(x), f) \to \rho(y)]) \quad \text{for } \rho(x) \ne \bot \tag{2.12} \\
\langle x == y,\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho, \hbar) \quad \text{for } \rho(x) = \rho(y) \tag{2.13} \\
\langle x \ne y,\ (\rho, \hbar) \rangle &\xrightarrow{\mathit{Pter}} gc(\rho, \hbar) \quad \text{for } \rho(x) \ne \rho(y) \tag{2.14}
\end{align}

The concrete semantics is defined to be the powerset lifting of the operational semantics.

Abstract semantics

Let $p^\sharp$ be a graph-like data structure composed of two kinds of arcs, $x \to h$ and $h' \xrightarrow{f} h$, where $x$ and $f$ range over variables and fields in $\mathrm{WHILE}_p$ and $h'$, $h$ range over abstract references of the underlying naming scheme. We call this data structure a points-to graph and write $\mathit{arc}(p^\sharp)$ for the set of its arcs. Given a set of concrete states $\tilde{p} \in \wp(\mathit{Pter})$, its abstraction is obtained as follows: whenever there exists $(\rho, \hbar) \in \tilde{p}$ such that $\rho(x) = r$ and $r \triangleright h$ for some variable $x$, concrete reference $r$ and abstract reference $h$, there must be an arc $x \to h$ in the points-to graph; if $\hbar(r', f) = r$ for some concrete state $(\rho, \hbar)$ with $r' \triangleright h'$ and $r \triangleright h$ for abstract references $h$, $h'$, then there must be an arc $h' \xrightarrow{f} h$ in the points-to graph. Because $H$, $\mathit{Fld}_p$ and $\mathit{Var}_p$ are assumed to be finite sets, there exists a smallest points-to graph that abstracts a given subset of $\mathit{Pter}$. The set of these points-to graphs can be defined as

$$\mathit{Pter}^\sharp \triangleq \wp(\mathit{Var}_p \times H) \times \wp(H \times \mathit{Fld}_p \times H)$$

It can be shown that this smallest abstraction, called the best abstraction in the terminology of abstract interpretation, is a points-to graph without garbage. By abuse of language, we write $\mathit{Pter}^\sharp$ for the garbage-free points-to graphs. The relationship between $\mathit{Pter}^\sharp$ and its concrete counterpart $\wp(\mathit{Pter})$ can be formalized with the concretization function $\gamma_p$ defined as

$$
\gamma_p(p^\sharp) \triangleq \left\{ (\rho, \hbar) \in \mathit{Pter} \;\middle|\;
\begin{array}{l}
\rho(x) = r \wedge r \triangleright h \Rightarrow x \to h \in \mathit{arc}(p^\sharp) \\
\hbar(r', f) = r \wedge r' \triangleright h' \wedge r \triangleright h \Rightarrow h' \xrightarrow{f} h \in \mathit{arc}(p^\sharp)
\end{array}
\right\} \tag{2.15}
$$

The abstract semantics of points-to analysis is usually specified in the style of a constraint system. Below, let $p^\sharp \ni a$ be a shortcut for $a \in \mathit{arc}(p^\sharp)$. The constraint system can be specified as

$$
\frac{p^\sharp \ni y \to h}{p^\sharp \ni x \to h}\ (x = y)
\qquad
\frac{p^\sharp \ni y \to h' \quad p^\sharp \ni h' \xrightarrow{f} h}{p^\sharp \ni x \to h}\ (x = y.f)
\tag{2.16}
$$

$$
\frac{}{p^\sharp \ni x \to r_{fresh}}\ (x = \mathtt{new})
\qquad
\frac{p^\sharp \ni x \to h' \quad p^\sharp \ni y \to h}{p^\sharp \ni h' \xrightarrow{f} h}\ (x.f = y)
\tag{2.17}
$$
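A constraint system of this kind is typically solved by saturating the arc set until no rule adds anything new. The sketch below illustrates that idea under several assumptions that are not taken from the source: statements are encoded as tuples, the abstract reference produced by `x = new` is the allocation site label of the statement (in line with the naming scheme described earlier), the store rule uses the standard premises "x points to h'" and "y points to h", and the analysis is flow-insensitive and ignores garbage collection. The encoding and the function name `points_to` are hypothetical.

```python
def points_to(stmts):
    """Saturate the points-to rules over a list of statements.

    Statement encodings (assumed for this sketch):
      ("new",   x, site)    x = new, allocated at abstract reference `site`
      ("copy",  x, y)       x = y
      ("load",  x, y, f)    x = y.f
      ("store", x, f, y)    x.f = y
    Returns (var_arcs, field_arcs) with arcs (x, h) and (h1, f, h).
    """
    var_arcs, field_arcs = set(), set()
    changed = True
    while changed:
        new_var, new_field = set(), set()
        for st in stmts:
            kind = st[0]
            if kind == "new":
                _, x, site = st
                new_var.add((x, site))
            elif kind == "copy":
                _, x, y = st
                new_var |= {(x, h) for (v, h) in var_arcs if v == y}
            elif kind == "load":
                _, x, y, f = st
                bases = {h1 for (v, h1) in var_arcs if v == y}
                new_var |= {(x, h) for (s, g, h) in field_arcs
                            if s in bases and g == f}
            elif kind == "store":
                _, x, f, y = st
                bases = {h1 for (v, h1) in var_arcs if v == x}
                targets = {h for (v, h) in var_arcs if v == y}
                new_field |= {(h1, f, h) for h1 in bases for h in targets}
        # Stop as soon as a full pass over the rules adds no new arc.
        changed = not (new_var <= var_arcs and new_field <= field_arcs)
        var_arcs |= new_var
        field_arcs |= new_field
    return var_arcs, field_arcs

program = [
    ("new", "a", "s1"),         # s1: a = new
    ("new", "b", "s2"),         # s2: b = new
    ("store", "a", "f", "b"),   # s3: a.f = b
    ("load", "c", "a", "f"),    # s4: c = a.f
    ("copy", "d", "c"),         # s5: d = c
]
print(points_to(program))
```

For this program the saturation yields the variable arcs a → s1, b → s2, c → s2, d → s2 and the single field arc s1 –f→ s2. Because the set of possible arcs is finite, the loop is guaranteed to terminate.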

Publication date: 2013